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Abstract 

The  Householder -Fox  algorithm  uses  the  Cholesky  decomposition  to  calculate 
an  orthonormal  basis  for  the  range  of  a projection.  In  this  paper  it  is 
shown  that  the  algorithm  continues  to  give  good  results  when  it  is  applied 
to  an  approximate  projection  in  the  presence  of  rounding  error. 


ON  TOE  HOUSEHOLDER- FOX  ALGORITHM 
FOR  DECOMPOSING  A PROJECTION 


Cleve  B.  Moler 
G.  W.  Stewart 

1.  Introduction 

A real  orthogonal  projection  is  a real  matrix  A satisfying  the 
following  two  conditions: 

T 

1.  A = A (symmetry) , 

(1.1)  2 

2.  A = A (idempotence) . 

Applied  to  a vector  x such  a matrix  produces  the  orthogonal  projection 
Ax  of  x onto  the  column  space  of  A (denoted  by  R(A));  that  is, 

Xj  = Ax  and  x2  = (I-A)x  are  the  unique  vectors  satisfying 

1.  x = Xj^  + x2, 

(1.2)  2.  x1  i R(A), 

3.  x^  x x2* 

The  conditions  (1.2)  are  easily  seen  to  follow  from  (1.1). 

In  some  applications  one  is  given  a projection  A and  wishes  to  find 
an  orthonormal  basis  for  the  subspace  R(A).  For  example,  if  A is  known 
to  be  of  low  rank,  say  rank  (A)  = r,  then  A can  be  represented  economically 
in  the  form 

A = QQT 

where  the  r columns  of  Q form  an  orthonormal  basis  for  R(A).  The 


I 


savings  in  storage  can  be  substantial  if  the  order  n of  A is  very  much 

2 

greater  than  r;  for  A requires  n /2  locations  for  its  storage  while 
Q requires  only  nr.  Projections  of  low  rank  arise  in  the  study  of  the 
spectra  of  molecules  with  high  degress  of  synmetry  (cf.  the  work  of  Fox 
and  Krone  [3]). 

One  method  for  computing  Q is  to  apply  various  orthogonal izing  . 

techniques  to  the  columns  of  A.  For  example,  one  might  use  Householder 

transformations  with  column  pivoting  to  compute  a QR  factorization  of  A ' 

•j 

[5,7],  However,  these  techniques  do  not  preserve  the  symmetry  of  A. 

Moreover,  there  is  considerable  evidence  that  when  A is  sparse,  ortho- 

gonalization  methods  can  lead  to  excessive  fill-in  [2].  j 

A method  which  is  symmetry  preserving  is  to  calculate  the  eigensystem  j 

of  A [6],  The  eigenvalues  of  A must  be  either  zero  or  unity,  and  the  j 

eigenvectors  corresponding  to  the  eigenvalue  unity  form  a basis  for  R(A) . I 

However,  the  method  suffers  from  fill-in  problems,  and  does  not  directly 

use  the  idempotency  of  A.  j 

Householder  and  Fox  [4]  have  observed  that  the  Cholesdky  factorization 
of  a projection  gives  the  required  basis  for  R(A)  directly.  The  form 

of  the  Cholesky  decomposition  used  here  is  stated  in  the  following  theorem,  1 

whose  proof  is  usually  a constructive  technique  for  calculating  it  (see  §3) . 

Theorem  1.1.  Let  A be  a positive  semi-definite  matrix  of  order  n 
and  rank  r.  Then  there  i?  a permutation  matrix  P and  an  n * r lower  j 

trapezoidal  matrix  of  rank  r such  that  ] 


T T 

PAP  = LL1. 
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The  process  by  which  the  rows  and  columns  of  A are  rearranged, 
i.e.,  the  manner  in  which  P is  chosen,  is  called  pivoting.  For  the 
present  we  shall  assume  that  the  pivoting  has  been  done  initially  and 
suppress  mention  of  the  matrix  P.  We  shall  return  to  the  role  of  pivot- 
ing in  the  final  section  of  this  paper. 

The  importance  of  the  Cholesky  decomposition  for  our  purposes  is 
contained  in  the  following  corollary. 

Corollary  1.2.  Suppose,  in  addition  to  the  hypotheses  of  Theorem 
1.1,  that  A2  = A.  Then 

LTL  = I. 

T 

Proof.  From  the  relation  A = LL  , it  follows  that 


(1.3) 


LLTLLT  = A2  = A = LLT. 


Since  the  columns  of  L are  independent,  L has  a pseudo- inverse 
L+  = (LTL)  *LT  satisfying  iA  = I.  Then  from  (1.3) 

LTL  = L+(LLTLLT)L+T  = Lt(LLT)L+T  = I.  □ 

The  inport  of  the  corollary  is  that  the  columns  of  L are  orthonormal. 
They  of  course  span  R(A);  hence  the  columns  of  L form  the  required  basis. 
However , in  practice  the  algorithm  must  be  used  in  the  presence  of  errors 
of  various  sorts,  and  it  is  the  purpose  of  this  paper  to  show  that  one  can 
still  expect  to  obtain  good  results. 


r 
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2.  Assessment  of  the  Final  Results 

There  are  two  sources  of  error  in  the  use  of  the  Householder- Fox 
algorithm.  First  the  matrix  A may  not  be  exactly  idempotent  (in  most 
applications  the  symmetry  of  A is  forced  by  other  considerations) . We 
summarize  this  state  of  affairs  by  writing 

(2.1)  A2  * A + F, 

where  the  symmetric  matrix  F is  presumed  small. 

The  second  source  of  error  is  the  rounding  error  made  in  the  course 
of  the  Cholesky  reduction  of  A.  The  effects  of  rounding  error  will  be 
investigated  in  more  detail  in  §3.  For  the  present  we  will  make  the  rea- 
sonable assumption  that  the  computed  matrix  L satisfies  a stability 
requirement  of  the  form 

(2.2)  LLT  = A + E, 

where  E is  a small  matrix  of  order  rounding  error  (cf.  Theorem  3.2  below). 
Assuming  (2.1)  and  (2.2),  we  shall  in  this  section  give  answers  to  the 
following  two  questions: 

1.  How  near  are  the  columns  of  L to  orthonormality? 
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and  the  spectral  matrix  norm  defined  by 


IIA||  * sup  ||Ax  || . 

11x11=1 

When  A is  symmetric,  its  spectral  norm  is  the  maximum  of  the  absolute 

2 T 

values  of  the  eigenvalues  of  A.  Also  for  any  matrix  X,  ||X||  = ||X  X||. 

We  begin  our  development  by  locating  the  eigenvalues  of  the  matrix  A 
which  for  the  rest  of  this  paper  is  assumed  to  be  symmetric.  The  eigen- 
values of  a projection  can  be  only  zero  and  unity,  and  Theorem  2.1  general- 
izes this  fact  by  showing  that  an  approximate  projection  in  the  sense  of 
(2.1)  must  have  eigenvalues  clustering  about  zero  and  unity. 

Theorem  2.1.  Let  A satisfy  (2.1).  Then  the  eigenvalues  of  A 
lie  in  one  of  the  two  intervals 


(2.4) 


1 + 


In  particular 


(2.5) 


IIAJI  < 1 + ||F|! . 


2 2 

Proof.  The  eigenvalues  of  A - A are  X - X,  where  X is  an  eigen - 
2 

value  of  A.  Since  A - A * F,  the  eigenvalues  of  A must  satisfy 
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X2  - X € HIFIIJFH], 

which  is  equivalent  to  saying  that  X lies  in  one  of  the  two  intervals 
(2.3)  or  (2.4).  The  largest  eigenvalue  of  A cannot  be  larger  than  the 
right  hand  end  of  the  interval  (2.4),  which  is  bounded  by  1 + ||F||.  This 
establishes  (2.5).  □ 

Asymptotically  for  small  F the  intervals  (2.3)  and  (2.4)  reduce 
to  HIFIIJFH]  and  [1-||F||,1+||F||] . 

If  A is  a projection,  then  so  is  I - A.  If  A is  an  approximate 
projection  in  the  sense  of  (2.1),  then 

(I -A) 2 = I - 2A  + A2  * (I -A)  + F. 

Hence  (I -A)  is  an  approximate  projection,  and  from  Theorem  2.1  we  have  the 
following  bound: 

Ill-All  < 1 + ||F||. 

We  are  now  in  a position  to  answer  the  first  of  our  questions. 

Theorem  2.2.  Let  the  matrix  A satisfy  (2.1)  and  let  L satisfy 
(2.2).  Suppose  that  L is  of  full  column  rank  and  satisfies 

(2.6)  e||L+||2  < y ’ 

where 

e = IIFII  + ||E||  (2+2||F||+||E||) . 

Then 
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(2.7)  ||L+!!2  < (1-2  e)  1 

odd 

(2.8)  li^L-HI  < JTfl  • 

Proof.  Froa  (2.1)  and  (2.2)  it  follows  that 

LLTLLT  = (A+E)2  = A+  E + F-  E + 'EA  + AE  + f}. 

Hence 

(2.9)  LTL  = I + L+[F+E(A-I)+AE+E2]LTt. 

It  follows  from  (2.9)  and  (2.6)  that 

IILTL-I||  <\. 

T 

In  particular  no  eigenvalue  of  L L can  be  less  than  or  equal  to  1/2, 

+ ? 

from  which  it  follows  that  ||L  ||  <2,  Again  from  (2.9) 

||LTL-I||  < 2e, 

and  from  this  the  bound  (2.7)  follows.  Finally  applying  (2.7)  to  (2.9) 
gives  (2.8).  □ 

The  condition  (2.6)  is  a requirement  that  the  columns  of  L be  inde- 
pendent. It  is  not  very  strong,  and  if  it  is  satisfied  it  implies  that  the 
columns  of  L are  almost  orthonormal  in  the  sense  of  the  inequality  (2.8), 
whose  right  hand  side  is  essentially  ||F||  + 2||E||.  Indeed  the  theorem  may  be 
interpreted  as  saying  that  the  columns  of  an  LL  decomposition  of  A 


cannot  be  slightly  independent  without  being  completely  so. 


Our  second  question  amounts  to  asking  if,  having  obtained  L,  we 
have  obtained  something  useful.  This  of  course  will  depend  on  what  we 
originally  desired  to  compute;  however,  in  most  applications  we  are 
seeking  a basis  for  the  column  space  of  an  exact  projection  which  we 
believe  to  be  near  A.  Now  any  matrix  satisfying  (2.1)  divides  n-space 
naturally  into  two  complementary  subspaces.  They  are  the  subspace  A^ 
spanned  by  the  eigenvectors  associated  with  the  eigenvalues  clustered 
about  unity  and  the  subspace  Aq  spanned  by  the  eigenvectors  associated 
with  the  eigenvalues  clustered  about  zero.  These  subspaces  are  orthogonal 
complements,  and  because  the  eigenvalues  associated  with  the  two  subspaces 
are  well  separated,  they  are  insensitive  to  small  perturbations  of  A 
(see  [1]  for  further  details).  It  follows  that  A^  must  be  a good 
approximation  to  the  column  space  of  any  projection  near  A. 

We  should  like  to  show  that  R(L)  is  a good  approximation  to  A^. 

We  shall  do  this  indirectly  by  showing  that  the  columns  of  L are  almost 
orthogonal  to  A^.  Since  the  columns  of  L are  almost  orthonormal  R(L) 
must  be  almost  orthogonal  to  Aq  and  cannot  help  being  a good  approxima- 
tion to  A^. 

Theorem  2.3.  Under  the  hypotheses  of  Theorem  2.2,  if  for  any  vector 
x with  ||x||  = 1 

l|Ax||  = 5, 


then 
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IIITX||  5 SiM 

AT71 


Proof.  From  (2.2)  we  have 


Hence 


LL  x = (A+E)x  = Ax  + Ex. 


LTx  = L+(Ax+Ex) 


and 


||LTx||  5 ||L+||  (||Ax||+||E||  ||x|i)  = 

/Fie 


It  should  be  pointed  out  that,  having  obtained  L,  one  can  approximate  the 

T 

projection  onto  AQ  by  I - LL  . If  the  dimension  of  AQ  is  very  much  less 

than  that  of  Ap  it  will  pay  to  decompose  I - A to  obtain  an  L spanning 

Aq  that  has  fewer  columns.  The  projection  for  A^  can  then  be  repre- 
T 

sented  as  I - LL  (however,  some  care  must  be  taken  to  insure  the  ortho - 

T T 

gonality  of  the  computed  projections  (LL  )x  and  (I-LL  )x). 


3.  The  Effects  of  Rounding  Error  and  the  Role  of  Pivoting 

The  size  of  the  matrix  E that  describes  the  effects  of  rounding 
error  on  the  computation  has  played  an  important  role  in  the  last  section. 
In  this  section  we  shall  give  reasons  for  expecting  E to  be  quite  small. 
The  analysis  also  makes  clear  the  role  of  pivoting  in  computing  the  decom- 
position. 


t 

\ We  begin  with  a detailed  description  of  the  Cholesky  algorithm  in  its 
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"exterior  product"  form.  The  algorithm  proceeds  in  stages.  At  the  k-th 
stage  A has  been  decomposed  in  the  form 


A = LkLk  + Bk, 


where  Lk  has  k columns  and  Bk  has  the  form 


(3.1) 


C 


>(k) 

22 


(k)  fk") 

with  B^2  of  order  n-k.  Denoting  by  b£  1 the  k-th  column  of  Bk  and 

by  Pkk^  the  (k,k) -element  of  Bk  we  form 


Vi 


B, 


b[kMk)T 

k k 


P 


w 

kk 


and 


\ 


he*!  ' Lk’ 


It  is  easily  verified  that  Lj^  is  lower  trapezoidal,  that  Bk+1  is 

zero  except  for  its  trailing  principal  minor  of  order  n-k-1,  and  that 
T 

A s hc+ihc+i + Bk+r  Thus  the  decomposition  is  advanced  one  stage.  The 

algorithm  terminates  when  some  Bk  is  negligible. 

The  algorithm  cannot  be  carried  out  in  the  form  described  above  if 
(k) 

Bkk  *s  not  positive.  However,  in  this  event  it  may  happen  that  there 

is  an  integer  «k  > k such  that  p^k)  is  positive.  Let  Pk  denote  the 

k Ic 


1 


11 


permutation  matrix  obtained  by  interchanging  rows  k and  ^ of  the 
identity  matrix  and  consider  the  decomposition 

pA  ■ <wT  * vpl  ■ 

and  the  matrix 

T 

still  has  the  form  (3.1).  However,  the  (k,k)-th  element  of  P^B^P^  is 

(kl  T 

j , and  the  decomposition  of  P^AP^  can  proceed  as  usual.  This  process 

k k 

of  interchanging  an  acceptable  element  into  the  (k,k)-th  position  of 
is  the  pivoting  process  mentioned  in  §1. 

It  is  still  conceivable  that  no  diagonal  element  of  is  positive. 

We  shall  show  that  this  is  not  likely  to  happen  unless  is  itself  negli- 

gible. We  begin  by  proving  a theorem  about  the  diagonal  elements  of  nearly 
idempotent  matrices. 

Theorem  3.1.  Let  the  symmetric  matrix  A of  order  n satisfy 

2 

A = A + F,  where 


The  matrix  P,L  is  still  lower  trapezoidal. 


r 


1-/T4M  J_ 
2 < 2n 


Then  either  || All  s r or  there  is  a diagonal  element  a.,  of  A that  satisfies 


(3.2) 


T > 


1 

2n  ‘ 


Proof.  As  was  observed  in  Theorem  2.1,  the  eigenvalues  of  A lie  in 
the  nonoverlapping  intervals  [-r,r]  and  [l-r,l+r].  By  perturbing  the 
eigenvalues  in  the  first  interval  to  zero  and  those  in  the  second  interval 
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to  unity  we  obtain  a matrix  A + G whose  eigenvalues  are  either  zero  or 
unity;  i.e.,  A + G is  a projection.  Moreover,  ||G||  £ y.  Now  if  IIAII  > y, 
then  one  of  the  eigenvalues  of  A must  lie  in  the  interval  [l-y,l+y], 
and  A + G must  have  unity  for  an  eigenvalue.  The  trace  of  A + G is  the 
sum  of  the  eigenvalues  of  A + G;  hence 


i=l 


( aAi+r^)  5 1- 


n 'll 


Thus  there  is  a diagonal  element  + y^  of  A + G satisfying 


(3.3) 


a.  • + y.  . > — 

li  li  n 


Since  y^  < ||G||,  (3.3)  implies  (3.2).  d 

It  must  be  noted  that  the  term  1/n  in  (3.2)  is  an  extreme  lower  bound 
and  can  be  replaced  by  p/n,  where  p is  the  number  of  eigenvalues  of  A 
in  the  interval  [1-y.l+y]. 

Theorem  3.1  shows  that  there  is  always  a reasonable  pivot  element  to 

start  the  reduction.  To  show  that  it  can  be  completed,  we  shall  show  that 

the  matrices  are  also  nearly  idempotent , after  which  Theorem  3.1 

applies  to  give  us  the  required  pivot  element.  In  addition  to  the  usual 
2 

assumption  that  A = A + F,  we  shall  take  account  of  rounding  error  by 
supposing  that  the  computed  and  satisfy 

LkHc  + h = A + Ek* 

For  notational  convenience  we  shall  drop  the  subscripts  k during  the 
analysis. 


- 13  - 

Let  B be  partitioned  as  in  (3.1),  and  let  A,  E,  and  L be  parti- 
tioned conformally: 


= (A^  ,A2)  1 


All 

A12 

A21 

A22 

E11 

E12 

E21 

E22 

= (ErE2): 


Assume  the  is  nonsingular  and  set 


x = lILf  II. 


Hence 


LL*  = A1  + E1< 


L1L  LLj  = aJax  + aJex  + eJax  + eJex 


^All+Ell-)  + F11  ' E11  + A1E1  + E1A1  + E1E1 


(3.4)  LTL  * I + L'1(Fn-E11+A^E1+EjA1+EjE1)L^1  = I + G, 
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where 

UGH  < X2  [ ||F ||+  HE II (3+  2 ||F  11+  ||E It) ] . 
It  also  follows  from  (3.4)  that 

||LTL||  < 1 + IIGil 


and 


||L||  <1  + 7 ||G||. 


A 


■i 


We  next  obtain  a bound  for  L - AL.  We  have 

ALL^  = AfAj+Ej)  = Ax  + F + AEX 
= (A1+E1)  + Fx  + (A-I)E1. 

Hence 

AL  « L + [Fj+  (A-I)Ej]L^ 

j 3 L + H, 

where 

||H||  < X[||F||+  ||E||(1+||F||)]. 

T 

Finally  since  B * A + E - LL  , 


I 


where 


- IS  - 


B2  * A2  - ALL1  - LLTA  + LLTLLT  + AE  + EA  + E2  - ELLT  - LLTE 


= (A+E)  - (L+H)LT  - L(LT+H)  + L(I+G)LT 


+ F + (A-I)E  + EA  + E2  - ELLT  - LLTE 


B - HLT  - LTH  + LGLT  + F + (A-I)E  + EA  + E2 


T T 
- ELL  - LL  E 


= B + K, 


||K||  < ||H||(2+||G||)  + !|G||(1+||G||) 


||F||  + l|E  II  (4+  ||F  ||+  ||E  ||+ 2 ||G  I!) . 


If  we  ignore  terms  of  the  second  order  in  the  bound  for  ||K||  we 
obtain  the  asymptotic  bound 


HKII  < \‘(||F||+3||E||)  ♦ 2X(||F||+||E||)  + (||F||+  4||E||), 


in  which  the  first  term  will  generally  dominate.  Since  L^L|  = A^  + E^j, 
the  number  X is  an  estimate  of  1|Ajj|).  This  explains  the  role  of  pivot- 
ing in  the  algorithm.  Not  only  is  pivoting  necessary  to  insure  that  one 
stage  of  the  algorithm  can  be  carried  out,  but  it  is  also  necessary  to 
keep  small  diagonal  elements  from  appearing  in  L^.  For  if  this  unhappy 
circumstance  occurs , then  X must  be  large  and  we  cannot  guarantee  the 
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successful  conclusion  of  the  algorithm.  Note,  however,  that  if  E and 
F are  small  we  can  hope  to  tolerate  rather  small  diagonal  elements,  which 
gives  us  considerable  freedom  in  the  choice  of  pivot  elements. 

We  have  not  yet  given  a quantitative  assessment  of  the  effects  of 
rounding  error  on  our  algorithm.  We  cite  a well  known  theorem  [7,8]. 

Theorem  3.2.  Let  the  algorithm  described  above  be  carried  out  in 
t-digit,  binary  floating-point  arithmetic.  Let 

Pk  = max  {p^:  i.j  * l,...,n;  1*1,.. . ,k-l>. 

Then 

l!EkH  2 

The  function  f(n)  depends  on  the  details  of  the  arithmetic  used; 

2 

but  it  is  certainly  less  than  0(n  ) with  a modest  order  constant.  The 
critical  factor  is  the  number  Pk,  which  measures  the  growth  of  the  elements 
. Since  0k  < 1 + 

show  that,  provided  we  have  maintained  a reasonable  degree  of  nonsingu- 

fkl 

larity  in  the  matrices  L^  J , rounding  error  should  have  a negligible 
effect  on  the  algorithm. 

To  summarize,  this  is  a remarkably  stable  algorithm.  Although  we 
cannot  guarantee  that  the  l|^  will  have  small  inverses,  we  think  that 
it  is  extremely  unlikely  that  anything  untoward  will  happen  if  a reason- 
able pivoting  strategy  is  adopted.  The  cautious  user  can  monitor  the 


of  the  matrices 


IIK^H,  the  above  analysis  applies  to 
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i 

i. 

■ 

i 

i 

i 

i 

i 


F 


> 


i 


I 


L 


as  the  B^  are  computed,  after  which  Theorem  3.2  and  Theorem  2.2 
will  enable  him  to  assess  his  results.  A particularly  attractive  feature 
of  the  algorithm  is  the  latitude  in  pivoting  strategies  that  the  bound 
on  ||K||  suggests  are  available  to  the  user.  For  example,  the  user  might 
compromise  the  size  of  his  pivots  to  preserve  sparsity  in  very  large 
problems  and  hope  to  get  away  with  it.  Experiments  by  Fox  and  Krohn 
[3]  in  which  the  pivot  order  is  fixed  initially  tend  to  confirm  this 
view. 
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